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I. ENGINEERING EVOLUTIONARY SEARCH 

Evolutionary search refers to a class of stochastic op- 
timization techniques — loosely based on processes be- 
lieved to operate in biological evolution — that have been 
applied successfully to a variety of different problems; 
see, for example, Refs. @g.|,|,|ij,[i]]]2(| and references 
therein. Unfortunately, the mechanisms constraining and 
driving the dynamics of evolutionary search on a given 



problem are not well understood. In mathematical terms 
evolutionary search algorithms are nonlinear population- 
based stochastic dynamical systems. The complicated 
dynamics exhibited by such systems has been appreci- 
ated for decades in the field of mathematical population 
genetics. For example, the effects on evolutionary behav- 
ior of the rate of genetic variation, the population size, 
and the function to be optimized typically cannot be ana- 
lyzed separately; there are strong, nonlinear interactions 
between them. These complications make an empirical 
approach to the question of whether and how to use evo- 
lutionary search problematic. The lack of a unified the- 
ory has rendered the literature largely anecdotal and of 
limited generality. The work presented here continues 
an attempt to unify and extend theoretical work that 
has been done in the areas of evolutionary search the- 
ory, molecular evolution theory, and mathematical pop- 
ulation genetics. The goal is to obtain a more general 
and quantitative understanding of the emergent mecha- 
nisms that control the dynamics of evolutionary search 
and other population-based dynamical systems. 

Our approach takes a structural view of the search 
space and solves the population dynamics as it is con- 
strained by a general architecture for epochal evolution — 
a class of population dynamics in which long periods of 
stasis are punctuated by rapid innovations. Based on a 
phenotypically induced decomposition, the genome space 
is broken into strongly and weakly connected sets. From 
this we motivate several simplifying assumptions that 
lead to the class of fitness functions and genetic operators 
we analyze. Stated in the simplest possible terms, all of 
the resulting population dynamical behavior derives from 
the interplay of the architecture, the infinite-population 
nonlinear dynamics, and the stochasticity arising from 
finite-population sampling. 

One might object that important details of real bio- 
logical evolution, on the one hand, or of alternative evo- 
lutionary search algorithms, on the other hand, are not 
described by the resulting class of evolutionary dynam- 
ical systems. Our response is simple: One must start 
somewhere. The bottom line is that the results and their 
predictive power justify the approach. Moreover, along 
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the way we come to appreciate a number of fundamental 
trade-offs and basic mechanisms that drive and inhibit 
evolutionary search. 

Our results show that a detailed dynamical under- 
standing, announced in Ref. |3rjl and expanded in Ref. 
|57j, can be turned to a very practical advantage. Specif- 
ically, we determine how to set population size and mu- 
tation rate to reach, in the fewest steps, the global opti- 
mum in a wide class of fitness functions. In other words, 
our objective is to minimize the total number of fitness 
function evaluations as a function of evolutionary search 
parameters. 

Our analysis provides several insights that are useful 
knowledge for engineers even in much more complicated 
optimization problems (and, for that matter, for the the- 
ory of evolutionary dynamics in biology) . Using these, in 
a sequel we draw some conclusions about those optimiza- 
tion problems for which population-based search meth- 
ods, such as genetic algorithms and genetic programming 
(to mention only two examples), are appropriate. 



II. LANDSCAPE ARCHITECTURE 



this evolutionary behavior? The evolutionary biologist 
Wright introduced the notion of "adaptive landscapes" 
to describe the (local) stochastic adaptation of popula- 
tions to themselves and to environmental constraints E|| . 
This geographical metaphor has had a powerful influ- 
ence on thinking about natural and artificial evolution- 
ary processes. The basic picture is that evolutionary dy- 
namics stochastically crawls along a surface determined, 
perhaps dynamically, by the fitness of individuals, mov- 
ing to peaks and very occasionally hopping across fitness 
"valleys" to nearby, and hopefully higher fitness, peaks. 

More recently, it has been assumed that the typical 
fitness functions of combinatorial optimization and bio- 
logical evolution can be modeled as "rugged landscapes" 
These are functions with wildly fluctuating fit- 
nesses even at the smallest scales of single-point muta- 
tions. The result is that these "landscapes" possess a 
large number of local optima. 



The fitness functions characteristic of problems that 
evolutionary search or (say) simulated annealing are be- 
ing used for in practice are very complicated, almost by 
definition. On the one hand, detailed knowledge of the 
fitness function implies that one does not need to run an 
optimization method to find high fitness solutions. On 
the other hand, assuming no structure at all leads to a 
completely random fitness function for which it is known 
that any optimization algorithm performs as well on av- 
erage as random search pi]] . Not surprisingly, reality is 
a middle ground. 

Therefore, our strategy to understand the workings of 
evolutionary search algorithms is to assume some struc- 
ture in the fitness function that is germane to search pop- 
ulation dynamics and to assume that, beyond this, there 
is no other structure. That is, apart from the structure 
we impose, the fitness function is as unstructured as can 
be. 

There is a concomitant and compelling biological mo- 
tivation for our choice of architecture. This is the 
common occurrence in natural evolutionary systems of 
"punctuated equilibria" — a process first introduced to 
describe sudden morphological changes in the paleonto- 
logical record 19 1. Similar behavior has been recently 
studied experimentally in bacterial colonies [ p"3| and in 
simulations of the evolution of transfer-RNA secondary 
structure |l5| . It has been argued, moreover, that punc- 
tuated dynamics occurs at both genotypic and pheno- 
typic levels ||. This class of behavior appears robust 
enough to also occur in artificial evolutionary systems, 
such as evolving cellular automata p|]28) and populations 
of competing programs 

How are we to think of the mechanisms that cause 




FIG. 1. Subbasin and portal architecture underlying 
epochal evolution. Roughly speaking, a population diffuses in 
the subbasins (large sets) until a portal (a tube) to a higher 
fitness subbasin is found. 

At the same time an increasing appreciation has devel- 
oped, in marked contrast to this rugged landscape view, 
that there are substantial degeneracies in the genotype- 
to-phenotype and the phenotype-to-fitness mappings. 
The crucial role played by these degeneracies has found 
important applications in molecular evolution; e.g. see 
Ref. |Tq] . Up to small fluctuations, when these degenera- 
cies are operating the number of distinct fitness values 
in the landscape is often much smaller than the number 
of genotypes. Moreover, due to the high dimensionality 
of these genotype spaces, sets of genotypes with approxi- 
mately equal fitness tend to form components in genotype 
space that are connected by single mutational steps. Fi- 
nally, due to intrinsic or even exogenous effects (e.g. en- 
vironmental) there simply may not exist a "fitness" value 
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for each genotype. Fluctuations can induce variation in 
fitness such that genotypes with neighboring fitness val- 
ues are not distinct at the level of selection. A similar 
effect can also arise when a fast rate of change does not 
allow subtle distinctions in fitness to become manifest. 

When these biological facts are taken into account we 
end up with an alternative view to both Wright's "adap- 
tive" landscapes and the more recent "rugged" land- 
scapes. That is, the fitness landscape decomposes into 
a set of "neutral networks" of approximately isofitness 
genotypes that are entangled with each other in a com- 
plicated and largely unstructured fashion; see Fig. 
Within each neutral network selection is effectively dis- 
abled and neutral evolution dominates. Some of the first 
steps in understanding the consequences of neutral evo- 
lution (in single neutral networks) were taken by Kimura 
in the 1960's using stochastic process analyzes adopted 
from statistical physics |25|. Despite the early progress 
in neutral evolution, a number of fundamental problems 
remain |l0| ]. Although we will analyze neutral evolution 
in the following, we also emphasize the global architec- 
tural structure that connects the neutral networks and 
drives and constrains epochal evolutionary search. 

This intuitive view of biologically plausible fitness 
landscapes — as a relatively small number of connected 
neutral nets — is the one that we adopt in the following 
analysis. We formalize it by making several more specific 
assumptions about the fitness function. First, we assume 
that there are N + 1 different neutral nets, with fitnesses 
1,2, . . . , N+1. Second, we assume that the higher the fit- 
ness, the smaller the isofitness neutral net volume. That 
is, there are fewer strings of high fitness than low. More 
specifically, we assume that the proportion of genotype 
space that is occupied by strings of fitness n scales as 
2~ Kn , where K is a measure of the rate at which the 
proportion of higher fitness strings decreases with fitness 
level. Finally, we assume that strings with fitness n + 1 
can be reached by a single point mutation from strings 
with fitness n. Specifically we assume that the set of 
strings with fitness n + 1 is a subspace of the set of strings 
with fitness n. The resulting architecture is a modified 
version of the general subbasin-portal structure of Fig. [j] 
and is illustrated in Fig. [|; after Ref. . 

Why assume that strings of higher fitness are nested 
inside those of lower fitness? We believe that this as- 
sumption is consonant, by definition, with the very idea 
of using evolutionary search for optimization. Imagine, 
on the contrary, that strings of fitness n+1 are more 
likely to be close to strings with fitness n — 1 than to 
those of fitness n. It then seems strange to have selection 
preferably replicate strings of fitness n over strings with 
fitness 71—1. One result is that this leads to an increased 
(ineffective) search effort centered around strings of fit- 
ness n. Therefore, designing a search algorithm to select 
the current best strings implicitly assumes that strings 
of higher fitness tend to be found close to strings of the 
current best fitness. 



B is the space of random strings 



Portal 




FIG. 2. The dimensional hierarchy of subbasins and por- 
tals for the Royal Staircase fitness functions. 

In this way we shift our view away from the geographic 
metaphor of evolutionary search "crawling" along a fixed 
and static (smooth or rugged) "landscape" to that of a 
diffusion process constrained by the subbasin-portal ar- 
chitecture induced by degeneracies in the genotype-to- 
phenotype and phenotype-to-fitness mappings. More- 
over, as will become more apparent, our approach is not 
just a shift in architecture, but it also focuses on the 
dynamics of populations as they move through the sub- 
basins to find portals to higher fitness. A side benefit 
is that it does not limit itself to evolutionary processes 
for which a potential function exists; as the landscape 
analyses do. 



III. THE ROYAL STAIRCASE FITNESS 
FUNCTION 

Under the above assumptions, the class of fitness func- 
tions, referred to as the "Royal Staircase" , delineated is 
equivalent to the following specification: 

1. Genomes are specified by binary strings s = 
S1S2 ■ ■ ■ SL,Si <E {0, 1}, of length L = NK. 

2. Reading the genome from left to right, the number 
I(s) of consecutive Is is counted. 

3. The fitness f(s) of string s with / consecutive ones, 
followed by a zero, is f(s) = 1 + \I{s)/K\ . The fit- 
ness is thus an integer between 1 and N + 1 . 

4. The (single) global optimum is the genome s = 1 L ; 
namely, the string of all Is. 

From this it is easy to see that we have chosen N (con- 
secutive) sets of K bits to represent the different fitness 
classes. These sets we call "blocks" . The first block con- 
sists of the first K bits on the left, i.e. s\ ■ ■ ■ sk- The 
second block consists of bits sk+i • • ■ S2K and so on. For 
each of these blocks there is one "aligned" configuration 
consisting of K Is and 1 K — 1 "unaligned" configurations. 
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If the first block is unaligned, the string obtains fitness 
1. If the first block is aligned and the second unaligned, 
it obtains fitness 2. If the first two blocks are aligned 
and the third unaligned, it obtains fitness 3, and so on 
up to the globally optimal string with all aligned blocks 
and fitness N + 1. 

Without affecting the evolutionary dynamics or the un- 
derlying architecture of genotype space, we could have 
chosen a different "aligned" block than the all-Is config- 
uration. In fact, we could have chosen different aligned 
configurations for the different blocks and still not af- 
fected the dynamics. Furthermore, since we will not be 
analyzing crossover, we could have chosen the locations of 
the bits of each block to be anywhere in the genome with- 
out affecting the dynamics. The only constraint, other 
than the block's ordering, is that we have N disjoint sets 
of K bits. 

Notice further that the proportion p n of strings with 
fitness n is given by: 



p n = 2-^"-!) (1 - 2 



- K ) 



(1) 



The net result is that this fitness function implements the 
intuitive idea that increasing fitness is obtained by setting 
more and more bits correctly. One can only set correct bit 
values in sets of K bits at a time and in blocks from left 
to right. (Due to the modularity of the subbasin-portal 
architecture, and of the resulting theory we present be- 
low, the restriction to uniform block size also could be 
lifted.) A genome's fitness is proportional to the number 
of blocks it has set correctly. This realizes our view of the 
underlying architecture as a set of isofitness genomes that 
occur in nested neutral networks of smaller and smaller 
volume; as shown in Fig. ^[ 



IV. THE GENETIC ALGORITHM 

For our analysis of evolutionary search we have chosen 
a simplified form of a genetic algorithm (GA) that does 
not include crossover and that uses fitness-proportionate 
selection. The GA is defined by the following steps. 

1. Generate a population of M bit strings of length 
L = NK with uniform probability over the space 
of I/-bit strings. 

2. Evaluate the fitness of all strings in the population. 

3. Stop the algorithm, noting the generation number 
f pt, if a string with optimal fitness N + 1 occurs 
in the population. Else, proceed. 

4. Create a new population of M strings by select- 
ing, with replacement and in proportion to fitness, 
strings from the current population. 

5. Mutate each bit in each string of the new popula- 
tion with probability q. 



6. Go to step 2. 

The total number E of fitness function queries is E = 
Mt pt. We are interested in the average number E of 
queries per GA run required over a large number R of 
runs. Note that the average total amount of mutational 
information introduced into the populations during a sin- 
gle GA run is qNKMt opt . 

The main motivation for leaving out crossover is that 
this greatly simplifies our analysis. The benefit is that we 
can make detailed quantitative predictions of the GA's 
behavior. Moreover, we believe that, from the point of 
view of optimization, the addition of crossover to the ge- 
netic operators only marginally improves the efficiency of 
the search. We comment on this, which admittedly is at 
variance with common beliefs about evolutionary search, 
later on. Additional discussion and supporting evidence 
can be found in section 6.5 of Ref. |57) and in Ref. (28). 

Notice that our GA effectively has two parameters: the 
mutation rate q and the population size M. A given 
search problem is specified by the fitness function in 
terms of N and K. The central goal of the following 
analysis is to find those settings of M and q that mini- 
mize the number E of fitness function queries for given 
N and K. 



V. OBSERVED POPULATION DYNAMICS 

The typical dynamics of a population evolving on a 
landscape of connected neutral networks, such as defined 
above, alternates between long periods of stasis in the 
population's average fitness ("epochs") and sudden in- 
creases in the average fitness ("innovations"). (See, for 
example, Fig. 1 of Ref. j37|.) As was first pointed out 
in the context of molecular evolution in Ref. p3[ , the 
best individuals in the population diffuse over the neu- 
tral network ("subbasin") of isofitness genotypes until 
one of them discovers a connection ("portal") to a neu- 
tral network of even higher fitness. The fraction of indi- 
viduals on this new network then grows rapidly, reaching 
an equilibrium after which the new subset of most-fit in- 
dividuals diffuses again over the new neutral network. In 
addition to the increasing attention paid to this type of 
epochal evolution in the theoretical biology community 
p8| , p2 29 ,3j^(J, recently there has also been an increased 
interest by evolutionary search theorists fp||2t 



The GA just defined is the same as that studied in 
our earlier analyses ^,0. Also, the Royal Staircase fit- 
ness function defined above is very similar to the "Royal 
Road" fitness function that we used there. It should not 
come as a surprise, therefore, that qualitatively the GA's 
experimentally observed behavior is very similar to that 
reported in Refs. |36) and p7| |. Moreover, most of the 
theory developed there for epochal evolutionary dynam- 
ics carries over to the Royal Staircase class of fitness func- 
tions. 



4 



We now briefly recount the experimentally observed 
behavior of typical Royal Staircase GA runs. The reader 
is referred to Ref. Q for a detailed discussion of the dy- 



namical regimes this type of GA exhibits under a range 
of different parameter settings. 




2000 



200 



400 



600 



800 



1000 




M = 100 q = 8x 10' 



2500 



1000 



Generations 



Generations 



FIG. 3. Examples of the Royal Staircase GA population dynamics with different parameter settings. The four plots show 
best fitness in the population (upper lines) and average fitness in the population (lower lines) as a function of time, measured 
in generations. The fitness function and GA parameters are given in each plot. In each case we have chosen q and M in the 
neighborhood of their optimal settings (see later) for each of the four values of N and K. 



Figure || shows the GA's behavior with four different 
parameter settings. The vertical axes show the best fit- 
ness in the population (upper lines) and the average fit- 
ness in the population (lower lines) as a function of the 
number of generations. Each figure is produced from a 
single GA run. In all of these runs the average fitness (/) 
in the population goes through stepwise changes early in 
the run, alternating epochs of stasis with sudden inno- 
vations in fitness. Later in the run, the average fitness 
tends to have higher fluctuations. Notice also that (/) 
roughly tracks the epochal behavior of the best fitness in 
the population. Notice, too, that often the best fitness 
shows a series of innovations to higher fitness that are 
lost. Eventually these innovations "fixate" in the pop- 
ulation. Finally, for each of the four settings of N and 
K we have chosen the values of M and q such that the 



total number E of fitness function evaluations to reach 
the global optimum for the first time is roughly minimal. 
Thus, the four plots illustrate the GA's typical dynamics 
close to optimal (M, ^-parameter settings — the analysis 
for which begins in the next section. 

There is a large range, almost a factor of 10, in times 
to reach the global optimum across the runs. Thus, there 
can be a strong parameter dependence in search times. 
Moreover, the variance of the total number E of fitness 
function evaluations is the same size as the average E. 
Thus, there are large run-to-run variations in the time 
to reach the global optimum, even with the parameters 
held constant. This is true for all parameter settings with 
which we experimented, of which only a few are reported 
here. 

Figure |](a) plots the results of a GA run with N = 8 
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blocks of K = 8 bits each, a mutation rate of q — 0.006, 
and a population size of M = 200. During the epochs, 
the best fitness in the population jumps up and down sev- 
eral times before it finally jumps up and the new more- 
fit string stabilizes in the population. This transition 
is reflected in the average fitness also starting to move 
upward. In this particular run, it took the GA approxi- 
mately 3 x 10 5 fitness function evaluations to reach the 
global optimum for the first time. Over 250 runs the 
GA takes on average 5 x 10 5 fitness function evaluations 
to reach the global optimum for these parameters. The 
inherent large per-run variation means in this case that 
some runs take less than 10 5 function evaluations and 
that others take many more than 10 6 . 

Figure ||(b) plots a run with N = 6 blocks of length 
K = 6 bits, a mutation rate of q = 0.012, and a popula- 
tion size of M = 150. The GA reached the global opti- 
mum after approximately 3 x 10 fitness function evalu- 
ations. On average, the GA uses approximately 5 x 10 4 
fitness function evaluations to reach the global fitness 
optimum. Notice that somewhere after generation 500 
the global optimum is lost again from the population. 
It turns out that this is a typical feature of the GA's 
behavior for parameter settings close to those that give 
minimal E. The global fitness optimum often only occurs 
in relatively short bursts after which it is lost again from 
the population. Notice also that there is only a small 
difference in (/} depending whether the best fitness is 
either 6 or 7. 

Figure ^(c) shows a run for a small number (N = 4) 
of large (K — 10) blocks. The mutation rate is q = 0.01 
and the population size is again M = 150. As in all three 
other runs we see that the average fitness goes through 
epochs punctuated by rapid increases of average fitness. 
We also see that the best fitness in the population jumps 
up several times before the population fixates on a higher 
fitness. The GA takes about 2 x 10 5 fitness function eval- 
uations on average to reach the global optimum for these 
parameter settings. In this run, the GA just happened to 
have taken about 2.5 x 10 5 fitness function evaluations. 

Finally, Fig. ^(d) shows a run with a large number 
(N = 10) of smaller (K — 5) blocks. The mutation rate 
is q = 0.008 and the population size is M = 100. Notice 
that in this run, the best fitness in the population alter- 
nates several times between fitnesses 7, 8, and 9 before 
it reaches the global fitness optimum of 11. After it has 
reached the global optimum for several time steps, the 
global optimum disappears again and the best fitness in 
the population alternates between 9 and 10 from then 
on. It is notable that this intermittent behavior of the 
best fitness is barely discernible in the behavior of the 
average fitness. It appears to be lost in the "noise" of 
the average fitness fluctuations. The GA takes about 10 5 
fitness function evaluations on average at these param- 
eter settings to reach the global optimum; while in this 
particular run the GA took only 6 x 10 4 fitness function 
evaluations. 

Again, we stress that there are large fluctuations in the 



total number of fitness evaluations to reach the global op- 
timum between runs. One tentative conclusion is that, if 
one is only going to use a GA for a few runs on a spe- 
cific problem, there is a large range in parameter space 
for which the GA's performance is statistically equiva- 
lent. On the one hand, the large fluctuations in the GA's 
search dynamics make it hard to predict for a single run 
how long it is going to take to reach the global optimum. 
On the other hand, this prediction is largely insensitive to 
the precise parameter settings in a ball centered around 
the optimal parameter settings. Thus, the large fluctua- 
tions lead to a large "sweet spot" of GA parameters, but 
the GA does not reliably discover the global optimum 
within a fixed number of fitness function evaluations. 



VI. STATISTICAL DYNAMICS OF 
EVOLUTIONARY SEARCH 

In Refs. |36| and [|7| we developed the statistical dy- 
namics of genetic algorithms to analyze the behavioral 
regimes of a GA searching the Royal Road fitness func- 
tion. The analysis here builds on those results. Due 
to the strong similarities we will only briefly introduce 
this analytical approach to evolutionary dynamics. The 
reader is referred to Ref. |37|] for an extensive and detailed 
exposition. There the reader also will find a review of 
the alternative methodologies for GA theory developed 
by Vose and collaborators |30| , |38| , p9] | , by Prugell-Bennett, 
Rattray, and Shapiro |]3l|-|33|, and in mathematical pop- 
ulation genetics. 

From a microscopic point of view, the state of an 
evolving population is only fully described when a list 
S of all genotypes with their frequencies of occurrence 
in the population is given. The evolutionary dynamics 
is implemented via the conditional transition probabili- 
ties P(S'\S) that the population at the next generation 
will be the microscopic state S' . For any reasonable ge- 
netic representation, there will be an enormous number of 
these microscopic states S and their transition probabil- 
ities. This makes it almost impossible to quantitatively 
study the dynamics at this microscopic level. 

More practically, a full description of the dynamics on 
the level of microscopic states S is neither useful nor typi- 
cally of interest. One is much more likely to be concerned 
with relatively coarse statistics of the dynamics, such as 
the evolution of the best and average fitness in the pop- 
ulation or the waiting times for evolution to produce a 
string of a certain quality. The result is that quanti- 
tative mathematical analysis faces the task of finding a 
description of the evolutionary dynamics that is simple 
enough to be tractable numerically or analytically and 
that, moreover, facilitates the prediction the quantities 
of interest to a practitioner. 

With these issues in mind, we specify the state of the 
population at any time by some relatively small set of 
"macroscopic" variables. Since this set of variables inten- 
tionally ignores vast amounts of microscopic detail, it is 
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generally impossible to exactly describe the GA's dynam- 
ics in terms of these macroscopic variables. In order to 
acheive the benefits of a coarser description, however, we 
assume that given the state of the macroscopic variables, 
the population has equal probabilities to be in any of 
the microscopic states consistent with the specified state 
of the macroscopic variables. This "maximum entropy" 
assumption lies at the heart of statistical mechanics in 
physics. 

We assume in addition that in the limit of infinite pop- 
ulation size, the resulting equations of motion for the 
macroscopic variables become closed. That is, for in- 
finite populations, we assume that we can exactly pre- 
dict the state of the macroscopic variables at the next 
generation, given the present state of the macroscopic 
variables. This limit is analogous to the thermodynamic 
limit in statistical mechanics; the corresponding assump- 
tion is analogous to "self-averaging" of the dynamics in 
this limit. 

The key, and as yet unspecified step, in developing 
such a thermodynamic model of evolutionary dynamics 
is to find an appropriate set of macroscopic variables that 
satisfies the above assumptions. In practice this is diffi- 
cult. Ultimately, the suitability of a set of macroscopic 
variables has to be verified by comparing theoretical pre- 
dictions with experimental measurements. In choosing 
such a set of macroscopic variables one is guided by our 
knowledge of the fitness function and the search's genetic 
operators. 

To see how this choice is made imagine that strings 
can have only two possible values for fitness, Ja and fs- 
Assume also that under mutation all strings of type A 
are equally likely to turn into type-i? strings and that 
all strings of type B have equal probability to turn into 
strings of type A. In this situation, it is easy to see that 
we can take the macroscopic variables to be the relative 
proportions of A strings and B strings in the population. 
Any additional microscopic detail, such as the number 
of Is in the strings, is not required or relevant to the 
evolutionary dynamics. Neither selection nor mutation 
distinguish different strings within the sets A and B on 
the level of the proportions of A's and B's they produce 
in the next generation. 

Similarly, our approach describes the state of the pop- 
ulation at any time only by the distribution of fitness 
in the population. That is, we group strings into equiv- 
alence classes of equal fitness and assume that, on the 
level of the fitness distribution, the dynamics treats all 
strings within a fitness class as equal. At the macroscopic 
(fitness) level of the dynamics, we know that a string of 
fitness n has the first n — 1 blocks aligned and the nth 
block in one of the 2 K — 1 other unaligned configurations. 
The maximum entropy assumption specifies that for all 
strings of fitness n, the nth block is equally likely to be 
in any of the 2 K — 1 unaligned configurations and that 
each of the blocks n + 1 through N are equally likely to 
be in any of their possible 2 K block configurations. 

Various reasons suggest the maximum entropy approx- 



imation will not be valid in practice. For example, the 
fixation due to finite population size makes it hard to be- 
lieve that all unaligned block configurations in all strings 
are random and independent. For large populations, for- 
tunately, the assumption is compelling. In fact, in the 
limit of very large population sizes, typically M > 2 L , 
the GA's dynamics on the level of fitness distributions ac- 
curately captures the fitness distribution dynamics found 
experimentally |37[ . In any case, we will solve explicitly 
for the fitness distribution dynamics in the limit of infi- 
nite populations using our maximum entropy assumption 
and then show how this solution can be used to solve for 
the finite-population dynamics. 

The essence of our statistical dynamics approach then 
is to describe the population state at any time during 
a GA run by a relatively small number of macroscopic 
variables — variables that in the limit of infinite popula- 
tions self-consistently describe the dynamics at their own 
level. After obtaining this infinite population dynamics 
explicitly, we then use it to solve for the GA's dynamical 
behaviors with finite populations. 

Employing the maximum entropy principle and focus- 
ing on fitness distributions is also found in an alternative 
statistical mechanics approach to GA dynamics devel- 
oped by Prugel-Bennett, Rattray, and Shapiro [[u] p3j . 
In their approach, however, maximum entropy is as- 
sumed with respect to the ensemble of entire GA runs. 
Specifically in their analysis the average dynamics, av- 
eraged over many runs, of the first few cumulants of the 
fitness distribution are predicted theoretically. The re- 
sult is that almost all of the relevant behavior, e.g. as 
illustrated in Fig. ||| of the previous section, is averaged 
away. In contrast, our statistical dynamics approach ap- 
plies maximum entropy only to the population's current 
state, given its current fitness distribution. The result 
is that for finite populations we do not assume that the 
GA dynamics is self-averaging. That is, two runs, with 
equal fitness distributions P at time t, are not assumed 
to have the same future macroscopic behavior. They are 
assumed only to have the same probability distribution 
of possible futures. 



A. Generation Operator 

The macroscopic state of the population is determined 
by its fitness distribution, denoted by a vector P = 
{Pi, P 2 , . . . , P/v+i}, where the components < P n < 1 
are the proportion of individuals in the population with 
fitness n — 1,2, .... N + 1. We refer to P as the pheno- 
typic quasispecies, following its analog in molecular evo- 
lution theory Since P is a distribution, we have 
the normalization condition: 

Af+l 

E p « = L ( 2 ) 

n=l 
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The average fitness (/) of the population is given by: 

N+l 

(/) = nP n- (3) 

In the limit of infinite populations and under our max- 
imum entropy assumption, we can construct a genera- 
tion operator G that maps the current fitness distribu- 
tion P(t) deterministically into the fitness distribution 
P(t + 1) at the next time step; that is, 

P(t + l) = G[P(t)\. (4) 

The operator G consists of a selection operator S and 
a mutation operator M: 

G = MS. (5) 

The selection operator encodes the fitness-level effect of 
selection on the population; and the mutation operator, 
the fitness-level effect of mutation. Below we construct 
these operators for our GA and the Royal Staircase fit- 
ness function explicitly. For now we note that the infinite 
population dynamics can be obtained by iteratively ap- 
plying the operator G to the initial fitness distribution 
P(0). Thus, the macroscopic equations of motion are 
formally given by 

P(*)=G«[P(0)] . (6) 

Recalling Eq. ([!]) it is easy to see that the initial fitness 
distribution P(0) is given by: 

P n (0) = 2- K(n - 1) (f - 2- K ) , 1 < n < N , (7) 

and 

P N+1 (0) = 2- KN . (8) 

As shown in Refs. and p^j, despite G's nonlinearity, 
it can be linearized such that the tth iterate G^ can 
be directly obtained by solving for the eigenvalues and 
eigenvectors of G. 

For very large populations (M > 2 L ) the dynamics 
of the fitness distribution obtained from GA simulation 
experiments is accurately predicted by iteration of the 
operator G. It is noteworthy, though, that this "infi- 
nite" population dynamics is qualitatively very different 
from the behavior shown in Fig. ||. For large populations 
strings of all fitnesses are present in the initial population 
and the average fitness increases smoothly and monoton- 
ically to an asymptote over a small number of genera- 
tions. (This limit is tantamount to an exhaustive search, 
requiring as it does 0(2 L ) fitness function evaluations.) 
Despite this seeming lack of utility, in the next section we 
show how to use the infinite population dynamics given 
by G to describe the finite-population behavior. 



B. Finite Population Dynamics 



There are two important differences between the 
infinite-population dynamics and that with finite pop- 
ulations. The first is that with finite populations the 
components P n cannot take on arbitrary values between 
and 1. Since the number of individuals with fitness n 
in the population is necessarily integer, the values of P n 
are quantized to multiples of 1/M . The space of allowed 
finite population fitness distributions thus turns into a 
regular lattice in N + 1 dimensions with a lattice spac- 
ing of 1/M within the simplex specified by normalization 
(Eq. d|)). Second, the dynamics of the fitness distribu- 
tion is no longer deterministic. In general, we can only 
determine the conditional probabilities Pr[Q|P] that a 
certain fitness distribution P leads to another Q in the 
next generation. 

Fortunately, the probabilities Pr[Q|P] are simply given 
by a multinomial distribution with mean G[P], which in 
turn is the result of the action of the infinite population 
dynamics. This can be understood in the following way. 
In creating the population for the next generation indi- 
viduals are selected, copied, and mutated, M times from 
the same population P. This means that for each of the 
M selections there are equal probabilities that a string of 
fitness n will be produced in the next generation. For an 
infinite number of selections the final proportions Q n of 
strings with fitness n are just the probabilities to produce 
a single individual with fitness n with a single selection. 
That is, given an infinite population G[P] we have for 
a finite population that the fitness distribution Q is a 
random sample of size M of the distribution G[P]. 

Putting these observations together, if we write Q n = 
m n /M, with < m n < M being integers, we have: 

Pr[Q|P] = Ml J] A J— . ( 9 ) 

, Tfln- 

n=l 



We see that for any finite-population fitness distribution 
P the operator G still gives the GA's average dynamics 
over one time step, since the expected fitness distribution 
at the next time step is G[P]. Note that the compo- 
nents G„[P] need not be multiples of 1/M. Therefore 
the actual fitness distribution Q at the next time step is 
not G[P], but is instead one of the lattice points of the 
finite-population state space. Since the variance around 
the expected distribution G[P] is proportional to 1/M, 
Q is likely to be close to G[P]. 



8 



C. Epochal Dynamics 

For finite populations, the expected change (dP) in the 
fitness distribution over one generation is given by: 

(dP) = G[P] - P. (10) 

Assuming that some component (dP{) is much smaller 
than 1/M, the actual change in component Pj is likely to 
be dPt = for a long succession of generations. That is, 
if the size of the "flow" (dPj) in some direction i is much 
smaller than the lattice spacing (1/M) for the finite pop- 
ulation, we expect the fitness distribution to not change 
in direction (fitness) i. In Refs. J|6| and |5?J we showed 
that this is the mechanism that causes epochal dynamics 
for finite populations. More formally, each epoch n of the 
dynamics corresponds to the population being restricted 
to a region in the rj-dimcnsional lower-fitness subspace of 
fitnesses 1 to n of the macroscopic state space. Stasis oc- 
curs because the flow out of this subspace is much smaller 
than the finite-population induced lattice spacing. 

As Fig. H illustrates, each epoch in the average fitness 
is associated with a constant value of the best fitness in 
the population. More detailed experiments reveal that 
not only is (/) constant on average during the epochs, in 
fact the entire fitness distribution P is constant on av- 
erage during the epochs. We denote by P n the average 
fitness distribution during the generations when n is the 
highest fitness in the population. As was shown in Ref. 

, each epoch fitness distribution P n is the unique fixed 
point of the operator G restricted to the n-dimensional 
subspace of strings with 1 < / < n. That is, if G™ is 
the projection of the operator G onto the n-dimensional 
subspace of fitnesses from 1 up to n, then we have: 

G"[P"] = P". (11) 

The average fitness /„ in epoch n is then given by: 

n 

/«=2>y- as) 

Thus, the fitness distributions P n during epoch n are ob- 
tained by finding the fixed point of G restricted to the 
first n dimensions of the fitness distribution space. We 
will construct the operators G™ explicitly below for our 
GA and solve analytically for the epoch fitness distribu- 
tions P™ as a function of n, K, and q. 

The global dynamics can be viewed as an incremental 
discovery of successively more dimensions of the fitness 
distribution space. In most realistic settings, it is typi- 
cally the case that population sizes M are much smaller 
than 2 L . Initially, then, only strings of low fitness are 
present in the initial population, as can also be seen 
from Eq. ([?]). The population then stabilizes on the 
epoch fitness distribution P n corresponding to the best 
fitness n in the initial population. The fitness distribu- 
tion fluctuates around P n until a string of fitness n + 1 



is discovered and spreads through the population. The 
population then settles into fitness distribution P n+1 un- 
til a string of fitness n + 2 is discovered, and so on, until 
the global optimum at fitness TV + 1 is found. In this 
way, the global dynamics can be seen as stochastically 
hopping between the different epoch distributions P n . 

Whenever mutation creates a string of fitness n+1, this 
string may either disappear before it spreads (seen as the 
isolated jumps in best fitness in Fig. y) or it may spread, 
leading the population to fitness distribution P n+1 . We 
call the latter process an innovation. Fig. || also showed 
that it is possible for the population to fall from epoch 
n (say) down to epoch n — 1. This happens when, due 
to fluctuations, all individuals of fitness n are lost from 
the population. We refer to this as a destabilization of 
epoch n. For some parameter settings, such as shown in 
Figs. ||(a) and||(c), this is very rare. The time for the 
GA to reach the global optimum is mainly determined 
by the time it takes to discover strings of fitness n+1 
in each epoch n. For other parameter settings, however, 
such as in Figs. ||(b) and |^(d), the destabilizations play 
an important role in how the GA reaches the global op- 
timum. In these regimes, destabilization must be taken 
into account in calculating search times. This task is 
accomplished in the sequel p{J. 



D. Selection Operator 

We now explicitly construct the generation operator 
G for the limit of infinite population size by construct- 
ing the selection operator S and mutation operator M. 
First, we consider the effect of selection on the fitness 
distribution. Since we are using fitness-proportionate se- 
lection, the proportion P/ of strings with fitness i after 
selection is proportional to i and to the fraction Pj of 
strings with fitness i before selection; that is, 

= c * P< . (13) 

The constant c can be determined by demanding that the 
distribution remains normalized: 

n JV+1 

i = E^ = c E ?p - ( 14 ) 

i=l i=l 

Since the average fitness (/) of the population is given 
by: 

JV+l 

(/) = ]T iPi, (15) 

i=l 

we have 

P s = — . (16) 
1 if) { ' 



<) 



We can thus define a (diagonal) operator S that works 
on a fitness distribution P and produces the fitness dis- 
tribution P s after selection by: 



N+l 



(17) 



Notice that this operator is nonlinear since, by Eq. (15), 
the average fitness (/) is a function of the distribution P 
on which the operator acts. 



E. Mutation Operator 

The precise form of the mutation operator M depends 
on the value n of the current best fitness in the popula- 
tion. Thus, we have to construct the mutation operator 
M for each possible n. Assuming that strings of fitness n 
are the highest fitness strings in the current population 
we calculate the probabilities M,j that a string of fitness 
j is turned into a string with fitness i under mutation, 
where the indices i and j run from 1 to n. Notice that, 
in this way, we explicitly calculate the elements M^- re- 
stricted to the n-dimensional subspace of the fitness dis- 
tribution space. 

First, consider the components M^ with i < j. These 
strings are obtained if mutation leaves the first i — 1 blocks 
of the string unaltered and disrupts the ith block in the 
string. The effect of mutation on the blocks i + 1 through 
N is immaterial for this transition. Multiplying the prob- 
abilities that the preceding i — 1 blocks remain aligned 
and that the ith block becomes unaligned we have: 



M ij = (l-qf-V K (l-(l-q) K ), i<j 



(18) 



The diagonal components Mjj are obtained when mu- 
tation leaves the first j — 1 blocks unaltered and does 
not mutate the jth block to be aligned. The maximum 
entropy assumption says that the jth block is random 
and so the probability P a that mutation will change the 
unaligned jth block to an aligned block is given by: 



Pn 



1 ~ (1 ~ g) 

2 K - 1 



A" 



(19) 



This is the probability that at least one mutation will oc- 
cur in the block times the probability that the mutated 
block will be in the correct configuration. Thus the di- 
agonal components are given by: 



M, 



(1-9) 



1 



l-(l-g) 

2 K - 1 



A" 



(20) 



Finally, we calculate the probabilities for increasing- 
fitness transitions M,j with i > j. These transition 
probabilities depend on the states of blocks j through 
ri—l. One approximation, using the maximum entropy 
assumption, is obtained by assuming that all these blocks 



are random. The jth block is equally likely to be in any 
of 2 K — 1 unaligned configurations. All succeeding blocks 
are equally likely to be in any one of the 2 K configura- 
tions, including the aligned one. In order for a transition 
to occur from state j to i, first of all the j — 1 aligned 
blocks have to remain aligned, then the jth block has 
to become aligned through the mutation. The latter has 
probability P a . Furthermore, the following i—j — 1 blocks 
all have to be aligned. Finally, the ith block has to be 
unaligned. Putting these together, we find that: 



Mfc- = (1 - q) 



(i-i)if 



l~(l-g) J 
2 K -1 



i-j-l 



1 



1 
2~« 



I > j 



(21) 



As a technical aside, note that for the case i — n the last 
factor does not appear. Since the current highest fitness 
in the population is n, it is almost certain that the nth 
block is unaligned in all strin gs in the population. As 
we show below in section VIII , the reason for this is that 



all individuals in the population during epoch n are de- 
scendants of strings in the highest fitness class n. The 
strings in the highest fitness class n have their nth block 
unaligned by definition. If any string with fitness i < n 
has the nth block aligned, this block must have become 
aligned in the few number of generations after it's ap- 
pearance from a string of fitness n. Generally, this only 
occurs with very low probability and so can be ignored. 

Another approximation to the components My with 
i > j is obtained by assuming that all individuals with 
fitness j, for every j < n, are offspring of an individual 
with fitness n that had its jth block become unaligned 
through mutation. This means that a string of fitness j 
has n — 2 aligned blocks and one unaligned block, namely, 
the jth block. Mutations from j to i with i > j then only 
occur from j to i = n and do so by aligning the jth block: 



(1-g) 



(n-2)K 



l-(l-g) 

2 K - 1 



A 



(22) 



It turns out that both approximations give very similar 
results for observables such as epoch duration and total 
number of fitness evaluations to reach the global opti- 
mum. In fact, to a good approximation we can set all 
components My with i > j to zero, since these compo- 
nents involve the alignment of at least one block through 
mutation. For K not too small this occurs with much 
smaller probability than block destruction. That is, to a 
good approximation, we can neglect terms proportional 
to P a in the components M«. 

The restricted generation operator G n is now obtained 
by taking the product of the selection operator S with the 
mutation operator M: 



G" = M • S, 



(23) 



where the component indices of the mutation and selec- 
tion operators run from 1 to n. 
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VII. QUASISPECIES DISTRIBUTIONS AND 
EPOCH FITNESS LEVELS 



During epoch n the quasispecies fitness distribution P n 
is given by a fixed point of the operator G". To obtain 
this fixed point we linearize the generation operator by 
taking out the factor (/) , thereby defining a new operator 
G" via: 



G -Jf) G 



(24) 



The operator G" is just an ordinary (linear) matrix op- 
erator and the quasispecies fitness distribution is nothing 
other than the principal eigenvector of this matrix. The 
principal eigenvalue /„ of G™ is the average fitness of 
the quasispecies distribution. In this way, obtaining the 
quasispecies distribution reduces to calculating the prin- 
cipal eigenvector of the matrix G™. Again the reader is 
referred to Ref . J37) . 

As in Refs. [ [36[ | and [jjTj, the local stability of the 
epochs can be analyzed by calculating the eigenvalues 
and eigenvectors of the Jacobian matrix DG™ around 
each epoch fitness distribution P n . The components 
DG™ of the Jacobian around epoch n are given by: 



DG' 



dGj{P) 
dPi 



fn 



(25) 



Just as in Ref. pTy, the eigenvectors U l of the Jacobian 
are given by U % — P % — P n , with corresponding eigen- 
values e" = fi/ f n - Thus, the spectra of the Jacobian 
matrices are simply determined by the spectrum of the 
generation operator G. The eigenvalues e™ determine 
the bulk of the GA's behavior. Since the matrix G is 
generally of modest size, i.e. its dimension is determined 
by the number of blocks N, we can easily obtain numer- 
ical solutions for the epoch fitnesses f n and the epoch 
quasispecies distributions P n . At the same time one also 
obtains the eigenvalues e" and eigenvectors U l of the Ja- 
cobian. 

For a clearer understanding of the functional depen- 
dence of the epoch fitness distributions on the GA's pa- 
rameters, however, we will now develop analytical ap- 
proximations to the epoch fitness levels /„ and quasis- 
pecies distributions P n . 

In order to explicitly determine the form of the quasis- 
pecies distribution P n during epoch n we mu st ap 
imate the matrix G™. As we saw in section VIE 
components My (and so of G) naturally fall into three 
categories. Those with i < j, those with i > j, and those 
on the diagonal i = j. Components with i > j involve at 
least one block becoming aligned through mutation. As 
noted above, these terms are generally much smaller than 
the terms that only involve the destruction of aligned 
blocks or for which there is no change in the blocks. We 



therefore approximate G" by neglecting terms propor- 
tional to P a . Under this approximation for the compo- 
nents of G™, we have: 



G^=j(l-q)^ K (l-(l-q) K ), %<j, (26) 



and 



G" =j(l-q)V-» K . (27) 



The components with i > j are now all zero. The result 
is that G" is upper triangular. As is well known in gen- 
eral matrix theory, the eigenvalues of an upper triangular 
matrix are given by its diagonal components. Therefore, 
the average fitness f n in epoch n, which is given by the 
largest eigenvalue, is equal to the largest diagonal com- 
ponent G". That is, 



f n = n(l-q)^ K 



(28) 



Notice that the matrix elements only depend on q and K 
through the effective parameter A defined by: 



A = (l-<z) K 



(29) 



A is just the probability that a block will not be mutated. 

The principal eigenvector P n is the solution of the 
equation: 



3=1 



(e&-M,)jy = o 



(30) 



Since the components of G™ depend on A in such a di- 
rect way, we can analytically solve for this eigenvector; 
finding that the quasispecies components are given by: 



i-l 



= (1 - A)nA"~ 1 ~' W nA"-'-j 
11 n\ h • > 



n \n-l-i _ t AA nA , t -x-j _ r, 

i=i J 
For the component P™ this reduces to 



(31) 



P n = U 



n\ n - j - j 
nA™" 1 ^ - j 



(32) 



The above equation can be re-expressed in terms of the 
epoch fitness levels ff. 



n-l 



K = a- 1 n 



3=1 



fn fj 

fn " A/j 



(33) 



Note that it is straightforward to increase the accuracy 
of our analytical approximations by including terms pro- 
portional to P a in the matrix G™ and then treating these 
terms as a perturbation to the upper triangular matrix. 
Using standard perturbation theory, one can then obtain 
corrections due to block alignments. We will not pur- 
sue this here, however, since the current approximation 
is accurate enough for the optimization analysis. 
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VIII. QUASISPECIES GENEALOGIES AND 
CROSSOVER'S ROLE 

Before continuing on to solve this problem, we digress 
slightly at this point for two reasons. First, we claimed in 
a previous section that all individuals in the population 
during epoch n are descendants of strings with fitness 
n. We will demonstrate this now by considering the ge- 
nealogies of strings occurring in a quasispecies population 
P n . Second, since the argument is quite general and only 
depends on the effects of selection, it has important im- 
plications for population structure in metastable states 
(such as fitness epochs) and, more generally, for the role 
of crossover in evolutionary search. 

For the nth epoch, let the set of "suboptimal" strings 
be all those with fitness lower than n; their proportion is 
simply 1 — P™. This proportion is constant on average 
during epoch n. Over one generation, the suboptimal 
strings in the next generation will be either descendants 
of suboptimal strings in the current generation or mu- 
tant descendants of "optimal" strings with fitness n. Let 
r denote the average number of offspring per suboptimal 
individual. The fact that the total proportion of subop- 
timal strings remains constant gives us the equation: 



(1 - P£) = r(l - Pll) + 



(1-M ?m )n 

p n 
Jn 



(34) 



The last term is the proportion of individuals of fitness 
n that are selected and do not stay at fitness n under 
mutation. From this we can solve for r to find: 



1 



(1 - M nn )nPZ = 1 - A 1 ~"P^ 
/„(1-P«) 1-P" 



(35) 



where we have used the previous analytical approxima- 
tions to /„ and M„ n , Eqs. ( |28| ) and (po|), in the last 
equality. 

We see that in every generation only a fraction r of 
the suboptimal individuals are descendants from subop- 
timal individuals in the previous generation. This means 
that after t generations, only a fraction r* of the sub- 
optimal individuals are the terminations of lineages that 
solely consist of suboptimal strings. There is a fraction 



of 1 



strings that have one or more ancestors with 



fitness n in the t preceding generations. After a certain 
number of generations t c this fraction becomes so small 
that less than one individual (on average) has only sub- 
optimal ancestors. That is, after approximately t c gen- 
erations in epoch n, the whole quasispecies will consist 
of strings that are descendants of a string with fitness n 
somewhere in the past. Explicitly, we find that 



tc 



log[Af(l-P™)] 


log 




1-Pg 



(36) 



As expected t c is proportional to the logarithm of the 
total number M(l — P") of suboptimal strings in the 
quasispecies. 



The above result can be generalized to the case in 
which the suboptimal strings are defined to be all those 
with fitnesses 1 to n — i. One can then calculate the time 
until all strings in these classes are descendants of strings 
with fitness n — i + 1 to n to find that the lower classes 
are taken over even faster by descendants of strings with 
fitness n. 

The preceding result is significant for a conceptual un- 
derstanding of the structure of epoch populations. All 
suboptimal individuals in the population have an ances- 
tor of optimal fitness that is a relatively small number 
of generations in the recent past. The result is that in 
genotype space all suboptimal individuals are always rel- 
atively close to some individual of optimal fitness. The 
suboptimal individuals never wander far from the indi- 
viduals of optimal fitness. More precisely, the average 
length of suboptimal lineages is 1/(1 — r) generations. 
That is, all suboptimal individuals are to be found within 
an average Hamming distance of Lq/ (1— r) from optimal- 
fitness individuals. The individuals with optimal fitness, 
however, can wander through genotype space as long as 
they do not leave the neutral network of optimal fitness 
strings — those with fitness n in epoch n. If the popula- 
tion is to traverse large regions of genotype space in order 
to discover a string of fitness larger than n, it must do 
so along this neutral network. In short, this is the rea- 
son we believe the existence of neutral networks, consist- 
ing of approximately equal fitness strings that percolate 
through large parts of the genotype space, is so impor- 
tant for evolutionary search; cf. Ref. j2j|. If strings of 
fitness n were to form a small island in a sea of much 
lower fitness strings that are at relatively large Hamming 
distances from islands with fitness higher than n, then 
there is little chance that a suboptimal fitness mutant 
will drift far enough from the island of fitness n strings 
to discover another island with higher fitness. 

This result — that all strings in the metastable popula- 
tion are relatively recent descendants of strings in the 
highest fitness class — should generalize to other selec- 
tion schemes such as elite selection, rank selection, and 
tournament selection. Furthermore, this view also pro- 
vides some insight into the effects of adding crossover 
to evolutionary search algorithms. Assume that we add 
crossover to out current GA; see also the discussion of 
similar crossover experiments in section 6.5 of Ref. p7| . 
The initial population still has a distribution of fitnesses 
given by Eq. (0). The best fitness in the initial popu- 
lation might be (say) 3, corresponding to the first two 
blocks being aligned and the third block unaligned. It is 
unlikely that crossovers will lead quickly to strings of fit- 
ness 4. Although the initial population will have strings 
with the 3rd block aligned, these strings are very unlikely 
to also have the first block aligned. This means that these 
strings have low fitness and so are unlikely to be selected 
as the parent of a crossover event. Moreover, relatively 
quickly, the entire population will become descendants of 
strings with fitness 3 that, by definition, have the third 
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block unaligned. Crossover events will thus almost never 
lead to the creation of strings of higher fitness; at least 
not through the "combining of building blocks" (2lJ . 

The positive contribution of crossover is that an 
aligned block may be formed from two parents each with 
an unaligned 3rd block if the crossover point falls within 
the 3rd block and if the resulting complementary sub- 
blocks are themselves aligned. The negative effect is 
that crossover may also combine lower fitness strings 
with higher fitness strings so as to produce two lower 
fitness offspring. Thus, with nothing else said or added, 
we expect the effect of crossover used in the GA to be 
marginal. Experiments with single-point crossover con- 
firm this. The global optimum is found somewhat more 
quickly, but the improvement in search time is very small 
and often washed out by the lar ge v ariance in search time. 
We return to this issue in Ref. 35 1. 

These arguments are specific to the Royal Staircase 
(and also Royal Road) classes of fitness function. How- 
ever, for evolutionary dynamics exhibiting epochal evo- 
lution we believe it to be the case that the population 
structure is a cloud of strings localized around strings of 
the current best fitness. Therefore, the effect of crossover 
mostly will be to increase the amount of mixing within 
the quasispecies cloud. That is, crossover acts during the 
epochs as a local mixing operator very much as mutation 
does. 



IX. MUTATION RATE OPTIMIZATION 



probability C n +\ that a string of fitness n+1 will be 
created over one generation is given by: 



MP"P„ 



(38) 



Our claim is that, as a first approximation, maximizing 
C„+i minimizes the number of generations the popula- 
tion spends i n ep och n. 

In section VII we derived an analytic approximation 
to the proportion P™ of individuals in the highest fitness 
class during epoch n as a function of A = (1 — q) . Since 
P„ is also a monotonic function of A, as it is of q, we 
can maximize C„+i as a function of A instead. Ignoring 
proportionality constants, the function to maximize is: 



C n+1 <x (l-A)P n "(A) . 



(39) 



Using Eq. ( p2| ) for the dependence of P™ on A and dif- 
ferentiating the above function C„+i with respect to A, 
we find that the optimal A Q satisfies: 



n(l - A Q ) 
A 



iXl 



n-l 

y 



(i-l)At- 1 
nX l ~ 1 — n + 



(40) 



The solution of this equation gives the optimal X (n) 
which is only a function of the epoch number n. This 
is an important observation, since it means that the op- 
timal value of the mutation rate q Q takes the following 
general form as a function of n and K: 



In the previous sections we argued that the GA's be- 
havior can be viewed as stochastically hopping from 
epoch to epoch until the search discovers a string with 
fitness N + 1. Assuming for the moment that the total 
time to reach this global optimum is dominated by the 
time the GA spends in the epochs, we will derive a way 
to tune the mutation rate q such that the time the GA 
spends in an epoch is minimized. 

During epoch n no string in the population has the 
nth block aligned. Thus, the main contribution to the 
waiting time in epoch n is given by the time it takes a 
mutant with the nth block set correctly to appear. As 
we have seen, the probability P a to mutate a single block 
such that it becomes aligned is given by: 



1o 



1 



Pa = 



l-(l-g) 

2 K - 1 



K 



1 - A 

2 K - 1 ' 



(37) 



where again A is the probability that a block will not be 
mutated at all. Obviously, the higher P a , the more likely 
mutation is to produce a new aligned block. Every gener- 
ation each individual in the population has a probability 
P a of aligning the nth block. Aligning the nth block only 
creates a string of fitness n+1 when all the n — 1 blocks 
to its left are aligned as well. That is, only the fraction 
P£ of the population with fitness n will produce a fitness 
n+1 string by aligning the nth block. Therefore, the 



yxjnj . 



(41) 



Once we solve for the function A D (n), we can immediately 
obtain the dependence of q on K using Eq. (^l|). 

In this calculation we assumed that the waiting time 
for discovering a higher fitness string dominated the time 
spent in an epoch. This means that as soon as a string of 
fitness n+1 is created, copies of this string spread through 
the population and the population stabilizes onto epoch 
n+1. In fact, it is quite likely that the string with fitness 
n+1 will be lost through a deleterious mutation in the 
aligned blocks or via sampling before it gets a chance to 
establish itself in the population. Recall the transitory 
jumps in the best fitness seen in Fig. ||. In Ref. |37| we 
used a diffusion equation approximation to calculate the 
probability 7r„ that a string with fitness n+1 will spread. 
We found that to a good approximation it is given by: 



1 



where 



i - i - p; 



In 



/n+1 fn 



-2 7 „ 



fn 



(42) 



(43) 



and the last approximation in Eq. (|4^) holds for rela- 
tively large population sizes. Notice that the spreading 
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probability ir n only depends on the relative difference of 
the average fitness in epoch n + 1 and epoch n. Using 
Eq. (03) for /„ we hnd: 



1-1 

n 



(44) 



Thus, we find that 7r„(A) is approximately given by: 

Tr n (\) = I - e- 2 ( 1+ i) x + 2 . (45) 

The probability C' n+1 to create a string of fitness n + 1 
that spreads through the population is thus given by: 



C'n+1 ~ C n+ i7T ra (A) 



(46) 



Taking the spreading probability 7r„ into account, we 
want to maximize C' n+1 in order to minimize the time 
spent in epoch n. Note that, also in this case, the de- 
pendence on q and K enters only through A. For each 
n there an optimal value of A from which the optimal 
mutation rate can be determined as a function of K . 
The optimal value A D in this case is the solution of: 



n(l - A Q ) 
A„ 



'n-l 



(i - 1)A* 
nAo _1 — n 



2 (1 + 1 e - 2 ( 1+ ^) A °+ 2 

+ 1 - K)— — 7 = 1. 

1 _ e " 2 ( 1+ " A °+ 2 



(47) 



Numerically, the solution A (n) is well approximated by: 



1 



3n 1.175 ' 



(48) 



as shown in Fig. ||, which plots (1 — A Q ) as a function of 
n. The solid line is the numerical solution obtained from 
Eq. (f47|); the dashed line is the approximation Eq. (48). 
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FIG. 4. Optimal A as a function of n. The vertical axis 
shows 1 - Ao on a logarithmic scale. The horizontal axis plots 
n on a logarithmic scale. The solid line is the numerical so- 
lution of Eq. (MjJ) and the dashed line is the approximation 
given by Eq. (W8). 



For large n, using Eq. 
optimal mutation rate by: 



(Eg) we can approximate the 
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Thus, the optimal mutation rate drops as a power-law in 
both n and K. This implies that, generally for the Royal 
Staircase fitness function, the mutation rate should de- 
crease as a GA run progresses so that the search will 
find the global optimum as quickly as possible. We pur- 
sue this idea more precisely elsewhere by considering an 
adaptive mutation rate scheme for the GA. 



X. FITNESS FUNCTION EVALUATIONS: 
THEORY VERSUS EXPERIMENT 

In the preceding sections we derived an expression for 
the probability C' n+1 to create, over one generation in 
epoch n, a string of fitness n + 1 that will stabilize by 
spreading through the population. From this we now es- 
timate the total number E of fitness function evaluations 
the GA uses on average before an optimal string of fit- 
ness TV + 1 is found. As a first approximation, we assume 
that the GA visits all epochs, that the time spent in inno- 
vations between them is negligible, and that epochs are 
always stable. By epoch stability we mean it is highly un- 
likely that strings with the current highest fitness will dis- 
appear from the population through a fluctuation, once 
such strings begin to spread. These assumptions appear 
to hold for the parameters of Figs. ||(a) and 0(c). They 
may hold even for the parameters of Fig. |^(bj, but they 
probably do not for Fig. ||(d). For the parameters of 
Fig. ||(d), we see that the later epochs (n — 8, 9, and 
10) easily destabilize a number of times before the global 
optimum is found. We will develop a generalization that 
addresses this more complicated behavior in Ref. 

The average number T n of generations that the popu- 
lation spends in epoch n is simply 1/C T ' 1+1 , the inverse of 
the probability that a string of fitness n+1 will be discov- 
ered and spread through the population. For a popula- 
tion of size M, the number of fitness function evaluations 
per generation is M, so that the total number E n of fit- 
ness function evaluations in epoch n is given by T n M . 
More explicitly, we have from this and Eqs. (Bq) and 
@ that: 



E n — 



1 



(1 - A)P„"7r„ 



(50) 



This says that, given our approximations, the total num- 
ber of fitness function evaluations in each epoch is inde- 
pendent of the population size M. The epoch lengths, 
measured in generations, are inversely proportional to 
M, while the number of fitness function evaluations per 
generation is M. Substituting our analytical expressions 



for P™ and 7r„, Eqs. (B3) and (kt5h, we have 
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E n (X) 



2 K - 1 



(1-A) 1 



-2(l+i)A+2 
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n\ n - 1 
nX n ~ 



N 



(51) 



In Ref. |35j we use this to derive analytical scaling ex- 
pressions for the minimal number of function evaluations 
that the GA requires on average in epoch n as given by 
E n {\ ), where A is the optimal A for epoch n. 

The total number of fitness function evaluations E(X) 
to reach the global optimum is given by the sum of E n (X) 
over all epochs n from 1 to TV: 



^ (1-A)tt„(A) A* nX n 



n 



n A 



n—i— 1 



(52) 



The optimal mutation rate for an entire run is obtained 
by minimizing the above expression for E with respect 
to A. 

Figure || compares Eq. (^2j) to the number (solid lines) 
of function evaluations estimated from 250 GA runs for 
the four different settings of TV and K of Fig. |[ Each 
plot is a function of mutation rate q and is given for a 
set of different population sizes M. 
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FIG. 5. Comparison of experimental results (solid lines) and theoretical predictions (dashed lines) of the total number E 
of function evaluations to reach the global optimal as a function of the mutation rate q, for four different fitness functions as 
determined by TV and K. The parameters are the same as those used in Fig. |§|. Plot (a) has TV = 8 and K = 8, (b) has 
TV = 6 and K = 6, (c) has TV = 4 and K — 10, and (d) has TV = 10 and K = 5. The dashed lines give the theoretical 
predictions according to Eq. (^2|). The solid lines are the results of experiments with different population sizes M. Each data 
point on the solid lines gives E averaged over R = 250 GA runs. Each solid line shows the estimated E as a function of q for a 
constant population size M. For low mutation rates, the solid lines approximately overlap, indicating that E is approximately 
independent of M in this regime. For large mutation rates, the lower values of M have larger E than for larger M. Plot (a) 
shows, from left to right, population sizes M = 150, M = 200, M = 250, M = 300, and M = 350; plot (b) M = 60, M = 90, 
M = 120, M = 150, M = 180, and M = 210; plot (c) M = 50, M = 80, M = 110, M = 140, M = 170, and M = 200; and 
finally, plot (d) M = 100, M = 150, M = 200, M = 250, and M = 300. 
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Each of the four plots in Fig. g shows, as a clashed 
line, the population-size independent theoretical approx- 
imation of the total number E of fitness function evalu- 
ations as a function of q. The four parameter settings of 
N and K are the same as in the runs (a), (b), (c), and 
(d) of Fig. §. In each plot in Fig. | the solid lines show 
E as a function of q for several different population sizes 
as obtained from GA experiments. Each data point on 
the solid lines is an average over 250 GA runs. 

Figure |(a) has N = 8 blocks of K = 8 bits. The de- 
pendence of E on mutation rate is shown over the range 
from q = 0.001 to q = 0.012. The optimal mutation 
rate occurs somewhere around q = 0.006. Each solid 
line shows the experimental dependence of E on q for a 
fixed population size. At high mutation rates, the lower 
population sizes have higher E. At high mutation rates, 
the set of solid lines in Fig. || show, from left to right, 
E{q) for population sizes M = 150, M = 200, M = 250, 
M = 300, and M = 350, respectively. 

Figure ||(b) has parameters N — 6 and K — 6. E(q) 
is shown over the range q — 0.001 to q — 0.025. Again, 
at large mutation rates the smaller population sizes have 
higher E(q). The solid lines from top to bottom in the 
high mutation rate regime show E(q) for population sizes 
M = 60, M = 90, M = 120, M = 150, M = 180, and 
M = 210, respectively. The optimal mutation rate occurs 
somewhere around q — 0.012. 

Figure |(c) has N = 4 blocks of length K = 10 as pa- 
rameter settings. E(q) is shown over the range q = 0.001 
to q — 0.02, the optimum occurring around q = 0.011. 
The solid lines shows the experimental E(q) for popula- 
tion sizes M = 50, M = 80, M = 110, M = 140, and 
M = 200. 

Finally, Fig. |(d) has N = 10 blocks of length K = 5 
bits. The horizontal axis ranges from q — 0.001 to 
q = 0.016 with the optimal mutation rate occurring 
around q Q = 0.008. The population sizes are, from top 
to bottom at high q, M = 100, M = 150, M = 200, 
M = 250, and M = 300, respectively. Note that for fig- 
ures ||(b), ||(c), an d ||(d) the tick marks on the vertical 
axis have a scale of lCr fitness function evaluations, while 
Fig. |5|(a) uses a scale of 10 6 fitness function evaluations. 

Note that in obtaining the theoretical predictions we 
have set 7Tat = 1 in Eq. (f32"|), since by definition the 
GA stops at the first occurrence of the globally optimum 
string. For the last epoch N it is only necessary to cre- 
ate a string of fitness N + 1, it does not need to spread 
through the population. 

Figure || shows that for all these different parameter 



settings, 1 the theory, which is independent of population 
size M, reasonably predicts the location of the optimal 
mutation rate q Q . It also predicts moderately well the 
shape of the curve around the optimum. 

It is notable that the theory consistently underesti- 
mates E(q). Furthermore, it underestimates E(q) more 
in the low mutation rate regime than in the high. We 
believe that both of these offsets can be explained by the 
effects of finite-population sampling. We assumed that 
all unaligned blocks in members of the current highest 
fitness class are random and independent of each other. 
This last assumption does not hold in general |l(J. Due 
to finite-population sampling and the resulting tendency 
to fixate, strings in the highest fitness class are corre- 
lated with each other. This means that if one individual 
in the highest fitness class has it's nth unaligned block 
at a large Hamming distance from the desired configu- 
ration (with that block aligned), then many individuals 
(as descendants) in the highest fitness class also tend to 
have their nth blocks at large Hamming distances from 
the desired configuration. This genetic correlation among 
the individuals increases the epoch durations. Moreover, 
this effect is more severe for small mutation rates which, 
along with small population sizes, increase population 
convergence. 

It turns out that this effect is difficult to analyze quan- 
titatively. In spite of this it appears that, for low muta- 
tion rates on up to the optimal mutation rate, the total 
number E of fitness function evaluations is indeed ap- 
proximately independent of population size M. More- 
over, the theory still accurately predicts the location of 
the optimal mutation rate q . Also, although the exact 
magnitude of E is underestimated, the largest deviation 
in E from the experimental minimum is 47% for the pa- 
rameters of Fig. ^|(a), whereas the minimal deviation is 
37%, for the parameters of Fig. |5|(c). Thus, the theory 
predicts the correct order of magnitude for the minimal 
number of fitness function evaluations. 

As was noted above, the experimental curves for differ- 
ent population sizes appear to collapse onto each other 
in the low mutation rate regime. As the mutation rate 
increases, the solid curves break off this common curve 
one by one and do so delayed in proportion to population 
size. As the mutation rate increases it appears that pro- 
gressively larger population sizes are necessary to main- 
tain the search's efficiency. Of course, this effect is not 
explained by our population-size independent theory. It 
is the topic of the sequel ]35|] . 

From the point of view of GA practice, it is impor- 



1 Each case has strings of comparable numbers of blocks, be- 
tween N = 4 and = 10, and block lengths, between K = 5 
and K — 10. This is mainly done since it is computationally 
expensive, if not infeasible, to obtain extensive experimental 
data for substantially larger values of N and K. 
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tant to emphasize again that there is a large variance 
between GA runs (at fixed parameters) in the total num- 
ber of function evaluations to reach the global optimum. 
For example, if there is an average number of 10 5 fitness 
function evaluations, then the standard deviation of the 
total number of fitness function evaluations is also al- 
most 10 5 . Some runs may take as much as 5 x 10 5 fitness 
function evaluations (say) and some may only take 10 4 
evaluations. Therefore, if one intends to use a GA for 
only a few runs, or maybe even just once, there is a large 
range of q and M for which the performance of the GA 
looks statistically identical. In other words, the GA has 
a large "sweet spot" with respect to parameter settings 
for optimal search. Note that the optimal mutation rate 
is generally well below q = 1/L. Thus, despite the large 
parameter sweet spot, a mutation rate of q = 1/L is far 
too large for epochal evolutionary search. 

XI. CONCLUSION AND FUTURE ANALYSES 

In summary our findings are the following. 

1. At fixed population size M, there is a smooth cost 
surface E(q) as a function of mutation rate q. It 
has a single and shallow minimum q Q . 

2. The optimal mutation rate q roughly occurs in the 
regime where the highest epochs N — 1, N , and 
N + 1 are marginally stable; see Fig. ||. 

3. For lower mutation rates than q Q the total number 
of fitness function evaluations E(q) grows steadily 
and becomes almost independent of the population 
size M. 

4. Crossover's role in reducing search time is marginal 
due to population convergence during the epochs. 

5. There is an mutational error threshold in q that 
bounds the upper limit of the GA's efficient search 
regime. Above the threshold, which is population- 
size independent, suboptimal strings of fitness N 
cannot stabilize in the population. 

6. The surface E(q) appears to be everywhere con- 
cave. That is, for any two parameters q\ and q2, the 
straight line connecting these two points is every- 
where above the surface E(q). We conjecture that 
this is always the case for mutation-only genetic 
algorithms with a static fitness function. This fea- 
ture is useful in that a steepest-descent algorithm 
applied to parameter q will lead to the unique op- 
timum q a . 

Synopsizing the results this way, we are anticipating some 
of the results developed for the population-size dependent 
theory ||. 

In this sequel we extend the above statistical dynamics 
analysis to account for £7(<?)'s dependence on population 



size. This not only improves the parameter-optimization 
theory, but also leads us to consider a number of issues 
and mechanisms that shed additional light on how GAs 
work and on those types of search problems (fitness func- 
tions) for which evolutionary search is and is not well 
suited. Since it appears that optimal parameter settings 
often lead the GA to run in a mode were the population 
dynamics is marginally stable in the higher epochs, we 
consider how epoch destabilization affects epoch stability 
and duration. We also analyze an adaptive evolutionary 
search algorithm in which the mutation rate and popu- 
lation size change dynamically as successive epochs are 
encountered. This leads to a reduction in search time 
and in resource requirements. 
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